-
Notifications
You must be signed in to change notification settings - Fork 18
Support postprocessing of instances #116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Support postprocessing of instances #116
Conversation
b7190ff to
dc31070
Compare
|
PR ready for review |
| print("simexpal: Restoring original instance files") | ||
| os.rename(partial_path + '.post0', partial_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, if we restore the instance file to the post0 file, how do we know if any given instance has already been postprocessed or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old logic here was: the post0 files are obtained through download, and the instance files are always already postprocessed, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, if we restore the instance file to the
post0file, how do we know if any given instance has already been postprocessed or not?
For each instance we create a .postprocessed file (lines 781 and 868) when the postprocessing was successful. Also, we print out status messages in the terminal. Although we probably shouldn't assume that it is enough. Maybe I could add an additional check in Instance.check_available() which looks for the .postprocessed file for instances that are supposed to be postprocessed. In this way, experiments can't be started on unprocessed instances that were supposed to be postprocessed.
The old logic here was: the
post0files are obtained through download, and the instance files are always already postprocessed, right?
Yes. With this PR .post0 files are still obtained through download and instance files are postprocessed depending on the existence of the .postprocessed file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concerning .postprocessed and Instance.check_available(): I think it is enough to inform the user when postprocessing has not been successful. Making successful postprocessing mandatory might lead to unexpected results for users (for example if the experiment does not really rely on it in practice).
…en postprocessing instances The .postprocessed file signals the successful postprocessing of an instance. The .original file is a copy of the instances before postprocessing
… types Also 'to_edgelist' was removed from instances where we do not support edge list conversion anyway
Right now it should suffice to check the existence of the files, which is done in check_available()
1dba381 to
a83a4b2
Compare
|
Fixed the merge conflicts for now. When the remarks above are resolved or ok with you I will mark this PR as ready for review again. |
| ^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| You can define arbitrary postprocessing steps by setting the ``postprocess`` key to a list of dictionaries | ||
| containing the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... to a list of dictionaries containing the keys:
| - ``environ``: dictionary of (environment variable, value)-pairs | ||
| - ``workdir``: path of the working directory | ||
|
|
||
| keys. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete this line then.
|
|
||
| Assume you want to postprocess the ``facebook_combined`` and ``cit-HepTh`` network from | ||
| `SNAP <https://snap.stanford.edu/data/>`_ using two executables ``postprocess1`` and ``postprocess2``, which | ||
| take the path of the instance as parameter. Also, you have to prepend the path for ``postprocess1`` to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does only postprocess1 needs the PATH variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just to demonstrate the possibility/necessity of adding the appropriate PATH-variable in simexpal in order to find the executable for the postprocessing. There might be cases where the PATH was added beforehand, so that is isn't necessary to do so anymore.
| Re-Postprocessing | ||
| ^^^^^^^^^^^^^^^^^ | ||
|
|
||
| To re-postprocess instances, you can simply delete the respective ``<instance_name>.postprocessed`` files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better name/reference one more time that instances are saved in inst_dir.
| ``simex instances run-transform --transform='to_edgelist'`` if you already have them saved locally. | ||
| execute supported actions, e.g, transforming the instances to edge list format via | ||
| ``simex instances run-transform --transform='to_edgelist'`` or :ref:`postprocess <PostprocessInstances>` | ||
| them if you already have them saved locally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put if you already have them saved locally to the beginning of the sentence. Otherwise it currently reads like: postprocessing is available if the instances are locally available (while the transform works all the time).
| print("simexpal: Restoring original instance files") | ||
| os.rename(partial_path + '.post0', partial_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concerning .postprocessed and Instance.check_available(): I think it is enough to inform the user when postprocessing has not been successful. Making successful postprocessing mandatory might lead to unexpected results for users (for example if the experiment does not really rely on it in practice).
| fullpath = instance.fullpath | ||
| util.try_rmfile(fullpath) | ||
| util.try_rmfile(fullpath + '.postprocessed') | ||
| util.try_rmfile(fullpath + '.original') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is with .post0 and .post1?
This PR contains the following:
args,workdirandenvironkeys in thepostprocesskey (like the build arguments incompile,configure, ...)assert file_size > 0while testing the installation process (so we can have empty (dummy) instances; also it should suffice to only check for the existence of instance files)